Improving Entity Retrieval on Structured Data

Authors

  • Besnik Fetahu
  • Ujwal Gadiraju
  • Stefan Dietze
Abstract

The increasing amount of data on the Web, in particular Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-step entity retrieval approach. In the first, offline preprocessing step, we cluster entities using the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model that takes advantage of the precomputed clusters. For a given user query and a set of entities retrieved by the BM25F retrieval approach, we expand the result set with relevant entities by considering features of the queries, the entities, and the precomputed clusters. Finally, we re-rank the expanded result set with respect to relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements over the baseline and state-of-the-art approaches.
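
The expand-then-re-rank idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy entities, the hard-coded cluster assignment (standing in for the offline x-means and spectral clustering step), the simple term-overlap score (standing in for BM25F and the paper's feature-based relevance model), and the function names `overlap_score` and `expand_and_rerank`. The sketch also adds every cluster member as a candidate, whereas the abstract describes a selective expansion based on query, entity, and cluster features.

```python
from collections import defaultdict

# Hypothetical toy data: entity id -> textual description (not from the paper).
ENTITIES = {
    "e1": "peter jackson film director new zealand",
    "e2": "lord of the rings film directed by peter jackson",
    "e3": "the hobbit film trilogy",
    "e4": "jackson five music group",
}

# Hypothetical precomputed clusters. The paper computes these offline with
# x-means and spectral clustering; here the assignment is simply hard-coded.
CLUSTER_OF = {"e1": 0, "e2": 0, "e3": 0, "e4": 1}


def overlap_score(query, text):
    """Stand-in relevance score: fraction of query terms found in the text.

    This is NOT BM25F; it only makes the re-ranking step concrete.
    """
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def expand_and_rerank(query, initial_ids, entities, cluster_of):
    """Expand an initial result set with cluster neighbours, then re-rank."""
    # Invert the entity -> cluster mapping into cluster -> members.
    members = defaultdict(set)
    for eid, cid in cluster_of.items():
        members[cid].add(eid)

    # Expansion: add every entity that shares a cluster with an initial hit.
    candidates = set(initial_ids)
    for eid in initial_ids:
        candidates |= members[cluster_of[eid]]

    # Re-ranking: order by descending score, breaking ties by entity id.
    scored = [(overlap_score(query, entities[e]), e) for e in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [e for _, e in scored]


if __name__ == "__main__":
    # Suppose the baseline retrieval returned only "e1" for this query.
    print(expand_and_rerank("peter jackson film", ["e1"], ENTITIES, CLUSTER_OF))
```

In this toy run, the initial hit `e1` pulls in its cluster neighbours `e2` and `e3`, while `e4` stays out because it belongs to a different cluster, even though it shares the term "jackson" with the query.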

Similar resources

Toward Entity Retrieval over Structured and Text Data

Many real-world applications increasingly involve both structured data and text. Hence, managing both in an efficient and integrated manner has received much attention from both the IR and database communities. To date, however, little research has been devoted to semantic issues in the integration of text and data. In this paper we introduced a problem in this realm: entity retrieval. Given da...

SIREn: Entity Retrieval System for the Web of Data

We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an “entity retrieval system” specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries and exhibits a concise index, constant time updates and inherits Information ...

Entity Retrieval over Structured Data

Entity retrieval is the problem of finding information about a given real-world entity (e.g., director Peter Jackson) from one or a set of data sources. This problem is fundamental in numerous data management settings, but has received little attention. We define the general entity retrieval problem, then discuss the limitations of current information systems (e.g., relational databases, search...

Steganography Scheme Based on Reed-Muller Code with Improving Payload and Ability to Retrieval of Destroyed Data for Digital Images

In this paper, a new steganography scheme with high embedding payload and good visual quality is presented. Before the embedding process, secret information is encoded in blocks using a Reed-Muller error correction code. After the data is encoded and embedded into the low-order bits of the host image, a modulus function is used to increase the visual quality of the stego image. Since the proposed method is able to embed...

Toward Structured Retrieval in Semi-structured Information Spaces

A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because each collection has its own schema, and there are no enforced keys or formats for data items across collections. Thus, structured methods like SQL cannot be easily employed, and users often must make do with only full-tex...

Publication date: 2015